---
title: 'Designing a Route Finder App Using Oregon Climbing Data'
author: "Nina"
date: '2022-05-30'
output:
  html_document:
    df_print: paged
tags:
  - R Markdown
  - plot
  - regression
categories:
  - R
  - Visualizations
---

Introduction

     The population of climbers is exploding, and we need more and better access to data to make the sport more accessible. I am using data provided by OpenBeta, a nonprofit built and run by climbers that enables “open access and innovative uses of climbing data” (1). I built a Shiny app that includes a map of sport and trad climbing routes in Oregon ranked by route quality and a recommendation engine tailored to users of any skill level.
     Following the introduction of climbing as a sport at the 2021 Tokyo Olympics, and the success of films such as Free Solo (2018), featuring Alex Honnold, and The Dawn Wall (2018), starring Tommy Caldwell, the industry has seen, and continues to see, historic growth and opportunities for new, profitable markets. According to Forbes, Google searches that included the term “climbing” reached an all-time high in the first week of August 2021, the same time frame in which the men’s and women’s combined events were held in Tokyo (2). Not only is the sport gaining a bigger audience, but it is also attracting regular people like you and me to take a crack at the crag. Following the pandemic, nearly 100 climbing gyms have opened in North America, and El Cap, one of America’s largest operators of climbing facilities, saw a 100% increase in online interactions (2).
     One concern with the sport’s booming popularity is the barrier to entry (e.g. where to find routes, at what level to begin, the fear of picking routes at the wrong level, etc.) and, as a result, there has been a push to make the sport more accessible. In addition, there are only a few databases of outdoor climbing routes that are accessible to the public. The Mountain Project and 8A are serviceable, but the sport needs more. Without additional platforms, climbers who are looking to hit their local crag or boulder may not be able to find routes or learn about their quality if they do not already have a community or word of mouth to rely on. While these websites have provided helpful tools to climbers of all experience and skill levels, they are still sorely lacking data, and attempts to scrape these platforms have resulted in DMCA takedowns and/or lawsuits (3). At the bare minimum, we need better and easier access to climbing data so that data scientists like myself can work to advance the sport for others. As the sport grows, so will the influx of data. As with any field that is expanding and rapidly changing, data science can add value by helping climbers and stakeholders make better-informed decisions, generating new insights about the sport’s players and audience, and improving the overall experience for users.
     OpenBeta is a nonprofit built and run by climbers that enables “open access and innovative uses of climbing data” (1). They have faced several challenges of their own: according to Outside Learn, their attempt to use onX’s data from the Mountain Project led to copyright infringement claims and blocked repositories (3). At the moment their data is public, and GitHub recently reversed the DMCA takedown thanks to legal efforts from the owner, Viet Nguyen, who is “empowering the community with open license climbing betas and source tools” (1). His goal for OpenBeta is to make climbing data more like an open source project, which in turn would help platforms like Mountain Project improve their recommendation systems, geolocation data, and the accuracy of submissions (3). In addition to pushing for accessible data, OpenBeta also posts articles that fit the needs of any climber in STEM: tutorials, current events, and project inspirations like recommendation systems and route quality maps. The community that OpenBeta is fostering aligns closely with the forward-looking mentality of today’s climbing community: don’t be a gatekeeper, spread the beta, and anyone is capable if empowered with the right information.
      As a young climber and data scientist, I found myself incredibly inspired by OpenBeta’s work and wanted to support the nonprofit by using their data and some of their resources for my capstone project. I want to leverage climbing data to influence decision making for climbers of all skill sets and as a result, contribute to the overarching goal of OpenBeta which is to make the sport of climbing safer, more knowledgeable, and more accessible. Recommendation systems are extremely powerful and if done well, can be a great tool for young climbers when exploring outdoor routes. To get a better understanding of the data and to ensure the viability of this goal, I first performed an exploratory data analysis of the OpenBeta data.

EDA

       I knew I wanted to initially work with the West Coast for a couple of reasons. For one, the Sierra Nevada of California and the Cascade Range of the Pacific Northwest are prime western U.S. rock climbing locales. In addition, the West Coast is scattered with popular climbing spots (e.g. Yosemite, Joshua Tree, and Smith Rock), but there is a common misconception that these areas only have routes graded “expert.” In reality the opposite is true, and there are actually more beginner to moderate routes than expert ones.
      Furthermore, I decided to home in on Oregon routes because Oregon does not get the widespread media coverage that California does. Very few people know that the birthplace of sport climbing was Oregon’s Smith Rock State Park, thanks to Alan Watts, a famous climbing pioneer, in 1986 (7). Not only is some of the best climbing in the nation to be found at Smith Rock, but there are also hidden gems right in Portland’s backyard that took me significant research as a climber to find. It would be even harder for a new climber to discover some of these on their own, not to mention that most of these routes are for climbers of all skill sets. The idea of my project is that anyone can have access to local classics, even in lower grades!

      We are only using routes with grades that are in the Yosemite Decimal System, which is the traditional difficulty rating for routes in the US. According to the Yosemite Decimal System, a 5.0 to 5.7 is considered easy, 5.8 to 5.10 is considered intermediate, 5.11 to 5.12 is hard, and 5.13 to 5.15 is reserved for a very elite few. This means that the app will be of use to any climber, as some local classics come in even lower grades.
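The grade bands above can be written as a small helper. This is only a minimal sketch, assuming grades arrive as strings such as "5.10b"; the function name `yds_level` and the "elite" label are hypothetical and not taken from the app.

```r
# Map a Yosemite Decimal System grade (e.g. "5.10b") to a difficulty band.
# yds_level() is a hypothetical helper; the band edges follow the text above.
yds_level <- function(grade) {
  # Pull out the number after "5." and ignore letter or +/- suffixes.
  minor <- as.integer(sub("^5\\.([0-9]+).*$", "\\1", grade))
  cut(minor,
      breaks = c(-Inf, 7, 10, 12, 15),
      labels = c("easy", "intermediate", "hard", "elite"))
}

yds_level(c("5.4", "5.10b", "5.11b", "5.13a"))
```

Keeping the grade as a string matters here: stored as a number, "5.10" would collapse to 5.1 and be indistinguishable from a much easier route.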

       One dataset I used from OpenBeta contains all route ratings in Oregon along with the route ID, grade, name, and type (such as trad, sport, ice, or bouldering). I focused only on trad and sport routes, which are two popular forms of rope climbing. One limitation of this approach is that we lose routes that are tagged as both sport and trad, but only a few routes in the data fall into this case. Another OpenBeta dataset I used had aggregate rating data along with the location of parent walls to use for plotting. The features I used from this dataset are the parent wall ID, name, location, state, and the ARQI rating (this metric is explained later).

      We can see that our data is dominated by sport routes (after all, Oregon is the birthplace of sport climbing). This raises a concern for bias with an item-based collaborative filtering recommendation system. With that being said, sport climbing is easily the most popular form of climbing nowadays. Not only is trad climbing out of date and mostly done by the pros, it is also extremely expensive and, as a result, a barrier to entry to the sport. Therefore I felt okay about this data imbalance when making a recommendation to a new user: they probably don’t want to be recommended trad routes at lower levels.

      As part of my exploratory data analysis, I also wanted to get a breakdown of classic routes. As a metric for route quality, we can look at the aggregate metric RQI or ARQI. The RQI is equal to S(1-1/N), where S is the average (or median) star rating and N is the number of votes. As N approaches infinity, (1-1/N) approaches 1 and RQI approaches S. One issue with this metric is that harder routes get fewer ascents and therefore fewer votes, making it difficult for hard routes to make it into the “classic” class.

      We will use the Adjusted RQI (ARQI), which corrects for the bias of RQI towards easier routes by adjusting the number of votes, so that route quality does not become a “popularity metric.” The ARQI is equal to S(1-1/Nw), where Nw is the number of weighted or adjusted votes and is determined by the votes-per-route for each grade.
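To make the formula concrete, here is a small worked example of the RQI. The helper name `rqi` and the star and vote values are invented for illustration. The ARQI has the same form with the adjusted vote count Nw in place of N, but the per-grade weighting that produces Nw is not reproduced here.

```r
# RQI = S * (1 - 1/N): S is the average (or median) star rating, N the vote count.
# rqi() is a hypothetical helper name.
rqi <- function(stars, n_votes) stars * (1 - 1 / n_votes)

# Same star rating, different vote counts: few votes pull the score down,
# and the score approaches S as the number of votes grows.
rqi(stars = 3.8, n_votes = c(2, 10, 100))
```

With only two votes, a 3.8-star route scores 1.9; with 100 votes it scores about 3.76, which is why routes with few ascents struggle to reach the top class under the unadjusted metric.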
      According to OpenBeta, the categories for route quality are the following:
      1. Classic: ARQI >= 3.5
      2. Area Classic: 2.5 <= ARQI < 3.5
      3. Good: 1.5 <= ARQI < 2.5
      4. Bad: 0.5 <= ARQI < 1.5
      5. Bomb: ARQI < 0.5
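
The categories above translate directly into bins. The following is a minimal sketch using base R's `cut()`, reading the "Bad" band as 0.5 <= ARQI < 1.5; the helper name `arqi_class` and the lowercase labels are my own choices for illustration.

```r
# Bin ARQI scores into the quality classes listed above.
# arqi_class() is a hypothetical helper name.
arqi_class <- function(arqi) {
  cut(arqi,
      breaks = c(-Inf, 0.5, 1.5, 2.5, 3.5, Inf),
      labels = c("bomb", "bad", "good", "area classic", "classic"),
      right = FALSE)  # intervals are [lower, upper), so 3.5 counts as classic
}

arqi_class(c(0.2, 1.0, 2.5, 3.49, 3.82))
```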

Class Distribution

      Note that some routes with a lower ARQI may have a higher median rating, because the ARQI also takes the number of votes into consideration. This allows for a more accurate and fair route designation, since we don’t want just any route to be labeled a classic.

      At the state level, we found that a majority of routes are Area Classics with outliers in the Bad to Bomb class. But what about at the grade level?

      We find that a majority of easy and intermediate routes are either classics or area classics, which supports my claim that anyone can climb a classic, not only at their local crag but also on the famous big walls climbed by the legends.

Route Mapping

      After some exploratory data analysis, I wanted to experiment with the functionality of my app. I wanted to design a route finder that would combine a recommendation and route ranking system in a map format using geolocation and ratings data. I used Plotly for interactivity, Mapbox for geocoding, and RShiny for construction of the web application. I accessed a public token from Mapbox in order to do some basic plotting and interactivity with Plotly. First I plotted all parent walls for both trad and sport and colored each wall based on route classes. Then with the idea that the user could filter routes by type, I plotted the routes by type and improved the formatting to be colored and sized by the ARQI (adjusted median rating).

Plot

      I wanted the user to be able to set filters based on their skill set, and I wanted the formatting of the plot to be simple, aesthetically pleasing, and effective. I decided that the map would show the routes matching the user’s filters, colored and sized by route quality, and later on would have a hover-over tooltip that could provide more information about the wall and its routes. This makes it easier for climbers to find all the information they need about the best possible routes in their ideal range. After finding an ideal format for this route finder, I could focus on the recommender aspect of my Shiny app.
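
For readers who want to try a similar map, here is a rough sketch of the idea with Plotly. It is not the app's actual plotting code: the wall names, coordinates, and scores are invented, and it uses the token-free "open-street-map" style as an assumption in place of a Mapbox token.

```r
library(plotly)

# Toy parent walls; names, coordinates, and scores are invented for illustration.
walls <- data.frame(
  parent_sector = c("Wall A", "Wall B", "Wall C"),
  lat = c(44.37, 45.54, 44.05),
  lon = c(-121.14, -122.38, -123.09),
  ARQI_median = c(3.6, 2.8, 1.9),
  class = c("classic", "area classic", "good")
)

# One marker per wall, colored by class and sized by ARQI.
# The "open-street-map" style is an assumption that avoids needing a token.
p <- plot_ly(
  walls, type = "scattermapbox", mode = "markers",
  lat = ~lat, lon = ~lon,
  color = ~class, size = ~ARQI_median,
  text = ~parent_sector, hoverinfo = "text"
) %>%
  layout(mapbox = list(style = "open-street-map", zoom = 6,
                       center = list(lat = 44.5, lon = -122)))
```

Printing `p` in an interactive session renders the map; the `text` column is what appears on hover.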

Recommendations

      For the recommendation system, I created an item-based collaborative filtering recommender, which asks the question “for users who climbed route x, which routes did they also climb?” and can predict routes based on the past preferences of other users (1). Following OpenBeta’s structure, I wanted to make this aspect of my project a tutorial in order to provide transparency about how recommendations are made and also a resource for future development. From what I’ve seen on famous route finders like theCrag or MP, these platforms do not implement recommendation systems for their routes (4, 5). They usually order the routes by popularity (average ratings or number of votes), but any data analyst using mean, median, or count as a metric for popularity should know to always consider outliers, skewed data, and relative proportions. In addition, I think having a simple recommendation system would be ideal for new climbers looking to find their first projects. I believe that a recommendation system combined with a map of route quality by ARQI score benefits the experienced climber as well. For example, if they disagree with the location, rating, or quality assessment of a certain route and receive a poor recommendation as a result, they can submit data they believe is more accurate to the Mountain Project (where OpenBeta gets its data). When the data funneling into the model becomes more accurate, you get a better recommendation, a better user experience, increased retention, and so on.

Item Based Recommendation Tutorial

My main reference for creating a simple item based recommendation comes from (6). Here I am taking the complete cases of my entire ratings dataset. Since recommendation systems are so computationally heavy, we should first get rid of any observations with nulls to decrease the load.
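
The filtering step described above can be sketched with base R's `complete.cases()`. The toy data frame below is invented; only its column names mirror the ones used later in this tutorial.

```r
# Toy ratings with missing values; the rows are invented for illustration.
ratings_raw <- data.frame(
  users    = c("u1", "u2", NA, "u4"),
  route_id = c("r1", "r1", "r2", NA),
  ratings  = c(4, NA, 3, 2)
)

# Keep only rows where every column is present.
ratings_clean <- ratings_raw[complete.cases(ratings_raw), ]
nrow(ratings_clean)
```

Only the first row survives here, since each of the others is missing a user, a route ID, or a rating.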

Find a Route

Next, we are going to choose a route that a climber from the Pacific Northwest may care about in order to get some recommendations based on that route. We could find the most popular route by looking at the route with the most votes, but we can now apply our new knowledge of the ARQI metric to get the route with the best quality.

or_ratings %>%
  group_by(route_id) %>%
  select(route_id, ARQI_median) %>%
  distinct() %>%
  arrange(desc(ARQI_median)) %>%
  head(3)
## # A tibble: 3 × 2
## # Groups:   route_id [3]
##   route_id  ARQI_median
##   <chr>           <dbl>
## 1 105892195        3.82
## 2 112552706        3.77
## 3 115689194        3.76
or_ratings %>% 
  filter(route_id == "105892195")  %>%
  select(route_id, route_name, grade, type, parent_sector, class, level) %>%
  slice(1)
## # A tibble: 1 × 7
##   route_id  route_name       grade type  parent_sector   class   level
##   <chr>     <chr>            <fct> <chr> <chr>           <fct>   <fct>
## 1 105892195 Spank the Monkey 5.1   sport (s) Monkey Face classic easy

The route ID with the greatest ARQI belongs to a route called “Spank the Monkey,” a classic, easy 5.1 sport route on the Monkey Face wall, which is a very popular wall in Smith Rock. Item Based Collaborative Filtering answers the question “climbers who climbed Spank the Monkey also climbed…?”

Monkey Rock at Smith Rock State Park in Oregon

Create User-Product Matrix

Now we spread out our users, route ID, and ratings across a pivot table that we convert to a simple matrix, called the user-product matrix, for calculating similarity scores.

or_wide <- or_ratings %>%
  select(users, route_id, ratings) %>%
  distinct() %>%
  pivot_wider(names_from = route_id, values_from = ratings)

row.names(or_wide) <- or_wide$users
or_wide$users <- NULL
or_mat <- (as.matrix(or_wide))
or_mat[1:3, 1:3]
##      106266547 111750995 106966352
## [1,]         4        NA        NA
## [2,]         4         4         4
## [3,]         4        NA        NA

Calculate Degree of Sparsity

The issue with the user-product matrix is its degree of sparsity. With this matrix, 99% of cells lack data which is an obvious limitation to this method. However, we may be able to tackle this issue with cosine similarity.

sum(is.na(or_mat))/(ncol(or_mat) * nrow(or_mat))
## [1] 0.9905141

Use Cosine Similarity to Measure Distance

We can use cosine similarity instead of Euclidean distance to compare routes, since cosine similarity looks at the direction of two vectors rather than the difference in their magnitudes. For example, a route that receives a 3.0 rating four times and a 4.0 rating eight times, represented as the count vector (4, 8), has a 100% similarity score to a route with one 3.0 rating and two 4.0 ratings, (1, 2), because the two vectors point in the same direction. Converting the Euclidean distance between the same vectors to a similarity with 1/(1 + distance) gives only 13%. I’m hoping that this method can be viable in tackling the data sparsity issue.

library(lsa)

route_x <- c(4, 8)
route_y <- c(1, 2)

cosine(route_x, route_y) # cosine
##      [,1]
## [1,]    1
1/(1 + sqrt((1-4)^2 + (2-8)^2)) # euclidean
## [1] 0.1297319

Compute

The next step is to build a function to compute the similarity for various routes in our matrix.

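cos_similarity = function(A,B){
  # Numerator: dot product over users who rated both routes (NA products are dropped)
  num = sum(A *B, na.rm = TRUE)
  # Denominator: product of the two vector norms, each over that route's own ratings
  den = sqrt(sum(A^2, na.rm = TRUE))*sqrt(sum(B^2, na.rm = TRUE)) 
  result = num/den

  return(result)
}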
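
To see how this function treats missing ratings, here is a small self-contained example. The function is repeated so the snippet runs on its own, and the two rating vectors are invented.

```r
# Same definition as above, repeated so the snippet runs on its own.
cos_similarity <- function(A, B) {
  num <- sum(A * B, na.rm = TRUE)
  den <- sqrt(sum(A^2, na.rm = TRUE)) * sqrt(sum(B^2, na.rm = TRUE))
  num / den
}

# Ratings from four hypothetical users for two routes; NA means "not rated".
route_a <- c(4, 3, NA, 4)
route_b <- c(4, NA, 2, 4)

# Only users 1 and 4 rated both routes, so only they enter the numerator,
# while each norm in the denominator uses all of that route's ratings.
cos_similarity(route_a, route_b)
```

One consequence of this design is that ratings from users who climbed only one of the two routes still enlarge the denominator, so routes with little overlap in climbers end up with lower similarity scores.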

Product-Product matrix

Now we can apply this function to obtain the product-product matrix. To prevent memory overload, we create a function to calculate the similarity for one route at a time.

route_recommendation = function(route_id, rating_matrix = or_mat, n_recommendations = 5){

  route_index = which(colnames(rating_matrix) == route_id)

  similarity = apply(rating_matrix, 2, FUN = function(y) 
                      cos_similarity(rating_matrix[,route_index], y))

  recommendations = tibble(ID = names(similarity), 
                               similarity = similarity) %>%
    filter(ID!= route_id) %>% 
    top_n(n_recommendations, similarity) %>%
    arrange(desc(similarity)) 

  return(recommendations)

}

Get Recommendations

Our function returns the top 5 similar routes to “Spank the Monkey.”

my_route <- "105892195"
recommendations = route_recommendation(my_route)
recommendations
## # A tibble: 5 × 2
##   ID        similarity
##   <chr>          <dbl>
## 1 106206822      0.302
## 2 105830103      0.261
## 3 108009452      0.251
## 4 105892160      0.246
## 5 110905373      0.244

Join with Ratings Data

Now we can join back to our original data to get information about the recommended routes. We now have information about similar routes climbed by users that also climbed and rated Spank the Monkey in a similar way. The most similar route to Spank the Monkey is The Conspiracy which is located at the Red Wall, another popular big wall in Smith Rock.

## # A tibble: 5 × 13
##   ID     name   similarity grade type  state sector_ID parent_sector   lon   lat
##   <chr>  <chr>       <dbl> <fct> <chr> <chr>     <dbl> <chr>         <dbl> <dbl>
## 1 10620… The C…      0.302 5.11b sport Oreg… 108302895 (3) Red Wall  -122.  45.5
## 2 10583… Suici…      0.261 5.10… sport Oreg… 105789050 (a) Picnic L… -121.  44.4
## 3 10800… Pouch…      0.251 5.13… sport Oreg… 105789295 (d) Aggro Gu… -121.  44.4
## 4 10589… The P…      0.246 5.4   trad  Oreg… 105892157 (tt) Mendenh… -121.  44.4
## 5 11090… Johnn…      0.244 5.11b trad  Oreg… 105865366 (4) Star Wall -121.  44.4
## # … with 3 more variables: ARQI_median <dbl>, class <fct>, level <fct>
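
The join itself is not shown above, so here is a minimal sketch of how it might look with dplyr. Both tables are toy stand-ins: `recommendations` mimics the output of the recommendation function, and `route_info` is a hypothetical lookup table with one row per route.

```r
library(dplyr)

# Toy stand-ins with invented values.
recommendations <- tibble(ID = c("r2", "r3"), similarity = c(0.30, 0.26))
route_info <- tibble(
  route_id = c("r1", "r2", "r3"),
  name     = c("Route One", "Route Two", "Route Three"),
  grade    = c("5.8", "5.11b", "5.10a"),
  type     = c("sport", "sport", "trad")
)

# Attach route metadata to each recommendation, keeping the similarity order.
recommended_routes <- recommendations %>%
  left_join(route_info, by = c("ID" = "route_id")) %>%
  arrange(desc(similarity))
recommended_routes
```

A left join keeps every recommended route even if its metadata is missing, which makes gaps in the lookup table easy to spot.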

      In the Shiny App, I wanted the user to be able to explore the parent walls of the recommended routes so that they could also see the proximity of like-routes. To build upon this method, one could also implement machine learning frameworks such as K-Nearest Neighbors to compute the cosine similarity more accurately with cross validation and hyperparameter tuning. My primary focus on the first rollout of my Shiny app was the user experience, not the model building, but this is an ongoing project that I plan to keep building upon and improving.

Shiny App

     With the proof of concept complete I was able to bring all the working pieces together in constructing my Shiny app. As seen below, the user can apply basic filters such as the grade range and type to get a quick map of available routes. When the user hovers over a point on the map, they get an overview of the parent wall which includes the name and location of the wall and information about an example route at the wall. If they click the parent wall, they can get a list of all the routes at the wall with more granularity. The table below includes additional metadata such as the number of votes, the grade class, the level class and more for each route. It is also ordered from highest rating to lowest which makes it easy for users to find the top routes at each parent sector.
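
The filter-then-rank behavior described above can be sketched as a plain function. This is an illustration only, not the app's server code: `filter_routes`, the `grade_minor` column, and the toy rows are all hypothetical, and in a Shiny app this logic would sit inside a reactive expression driven by the inputs.

```r
# Toy route table; the values are invented for illustration.
routes <- data.frame(
  route_name  = c("A", "B", "C", "D"),
  type        = c("sport", "trad", "sport", "sport"),
  grade_minor = c(6, 9, 10, 12),   # the number after "5." in the YDS grade
  ARQI_median = c(3.1, 2.7, 3.6, 3.9)
)

# filter_routes() is a hypothetical helper: keep one route type within a grade
# range, then sort from highest to lowest ARQI.
filter_routes <- function(routes, route_type, min_minor, max_minor) {
  keep <- routes$type == route_type &
    routes$grade_minor >= min_minor & routes$grade_minor <= max_minor
  out <- routes[keep, ]
  out[order(-out$ARQI_median), ]
}

filter_routes(routes, "sport", min_minor = 5, max_minor = 10)
```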

knitr::include_url("https://nhernandez.shinyapps.io/climbing_app/", height = "1000px")

Link: https://nhernandez.shinyapps.io/climbing_app/

     The incorporation of the route recommender is my favorite part of the Shiny app. If the user clicks on a row of the main table, a second table pops up to the right of the plot with the top five recommendations for that route. This mini table also has metadata about the various routes, specifically the parent wall where each route is located. With this information the user can enter individual parent walls (see the “Enter a Parent Wall” input option on the top left panel) for plotting, which allows users to see where recommended routes are located relative to the initial parent wall they clicked on the map.
     I wanted to make the functionality of the app as user friendly as possible. I wanted to match the natural intuition of user experience by making the app fluid and continuous, but I also wanted to provide guidance if a user gets stuck. There are help buttons that provide an overview of class and grade distributions that were explained earlier in this article. This gives the user a solid place to start for navigating routes if they’re unsure of what they want. Notice, however, that the help buttons have to be clicked in order to trigger the pop-ups. I don’t want to “scare” the user away with information overload by covering the entire app with text. I also don’t want users to feel like there is a wrong place to start – I want the experience to be their own and freely structured while still providing any information the user might need.
     Putting my technical skills into a topic that I’m passionate about really allowed me to construct an app that I feel was effective and multidimensional. I was inspired by OpenBeta’s work to produce a project that aligns with making the sport of rock climbing more accessible as well as supporting open source projects in software development. I feel that my unique experiences as a data scientist as well as a climber have allowed me to give back to both communities with this project. While my project is, at present, only a resource for the Oregon climbing community, I believe that it could be scaled out and be equally useful nationally and even globally; wherever there are rocks and mountains, and people who yearn to climb them. I’m excited to see where this sport goes in the coming years and I hope to reflect such change in my project.

Conclusion

     One limitation of my project is the restriction placed on the data I’m using. Following a legal battle with onX regarding copyright infringement, which OpenBeta won (as noncommercial and educational factual data cannot be copyrighted), OpenBeta is working to release their data under a public domain, permissive license (1,3). For this reason, all of OpenBeta’s current datasets only include user ratings up to 2020, and it doesn’t seem like they will be able to update them by August. As a result, the recommendations miss the ratings contributed during the increase of climbers after Tokyo, which is a missed opportunity for generating better recommendations. In the future, climbing data from the Mountain Project will be streamed directly into the model prior to rollouts, which will allow for more accurate results and recommendations.
      My route finder is a free tool for the Oregon climbing community that streamlines the route searching process for climbers of varying skill sets. I also hope that this post is a resource that any stakeholder (like participants, spectators, media, sponsors, businesses, brands, developers, analysts, etc.) could benefit from. Guiding the recent explosion of climbers properly could help make the sport extremely profitable and the climbing community larger and more diverse. I hope to help dissolve barriers to entry by providing a tool to climbers that keeps them safe, in the loop, and connected to other climbers. For the data community, I also want to promote the open source movement in software and data, as I strongly believe that it is essential in encouraging innovation, attracting diverse talent, and broadening perspectives within tech.

  1. https://OpenBeta.io/
  2. https://www.forbes.com/sites/michellebruton/2021/11/24/interest-in-climbing-and-gym-memberships-have-spiked-following-sports-tokyo-olympics-debut/?sh=3daaf24326a8
  3. https://www.climbing.com/news/mountain-project-OpenBeta-and-the-fight-over-climbing-data-access/
  4. https://www.thecrag.com/
  5. https://www.mountainproject.com/
  6. https://anderfernandez.com/en/blog/how-to-code-a-recommendation-system-in-r/
  7. https://www.climbing.com/videos/pioneering-smith-rock-alan-watts-and-the-birth-of-us-sport-climbing/